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ABSTRACT 

In  this  study,  we  present  an  algorithm  for  system  identification  for  systolic  array 
implementation.  With  this  schema,  discrete  samples  of  input  and  output  data  of  a  sys- 
tem with  uncertain  characteristics  are  used  to  determine  the  parameters  of  its  model. 
The  identification  algorithm  is  based  on  recursive  least  squares,  QR  decomposition,  and 
block  processing  techniques  with  covariance  resetting.  The  identification  process  is 
based  on  the  use  of  Givens  rotation.  Additionally,  we  want  to  address  the  following 
problems:  how  the  round-off  error  propagates  in  time  and  the  implementation  in  closed 
loop  adaptive  control.  We  will  compare  the  implementation  of  fixed  point  arithmetic 
with  the  implementation  of  floating  point  arithmetic.  This  is  primarily  a  theoretical  in- 
vestigation to  be  conducted  with  computer  simulations  where  numerical  results  will  be 
investigated. 
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I.     INTRODUCTION 

A.     BACKGROUND 

When  we  design  control  systems  we  are  faced  with  the  problem  of  identifying  the 
plant  to  be  controlled.  In  particular,  we  try  to  determine  the  parameters  of  a  math- 
ematical model  (i.e.,  a  linear  differential  or  difference  equation)  which  best  fits  the 
input-output  data  of  the  given  plant. 

In  some  cases,  insight  of  a  model  may  be  obtained  from  the  laws  of  Physics, 
Chemistry,  etc.  In  most  cases,  this  may  not  be  possible  due  to  the  complexity  of  the 
physical  factors  involved.  In  these  instances,  it  may  be  possible  to  derive  the  values  of 
the  parameters  by  observing  the  nature  of  the  system's  response  under  appropriate  ex- 
perimental conditions.   This  procedure  is  called  "parameter  estimation". 

The  problem  of  adaptively  controlling  systems  with  uncertain  characteristics  de- 
pends on  identification  of  the  unknown  system  parameters.  In  these  cases,  the  parame- 
ters of  the  controller  arc  computed  on  the  basis  of  the  current  estimate  of  the  dynamics. 
Figure  1  shows  a  general  structure  of  an  adaptive  control  system. 

The  purpose  of  parameter  estimation  is  to  best  fit  a  proper  model  to  the  input- 
output  data  of  the  system  under  investigation.    The  main  issues  are  the  following: 

1.  Select  an  appropriate  class  of  models,  and 

2.  Select,  an  appropriate  estimation  algorithm. 

for  the  particular  case  of  estimating  the  parameters  of  linear  models,  we  search 
within  the  class  of  models  with  a  given  fixed  order  for  the  one  which  minimizes  a  pre- 
diction error  criterion. 

Firstly,  suppose  we  wish  to  select  a  model  of  a  system  based  on  input-output 
measurements,  in  the  form 

y(r)  =  <pTlt)0  +  v(t)  (1.1) 

where  y(t)  and  <pT{t)  are  determined  on  the  basis  of  the  output  and  input  signals  and  0 
is  an  array  of  unknown  parameters  to  be  determinated.  The  term  v(t)  represents  noise 
or  other  modeling  errors.  Further  information  about  this  equation  will  be  given  in  the 
next  chapter.  This  class  of  linear  models  is  preferable  because  of  its  simplicity  and  the 
amount  of  theory  developed  to  analyze  them. 


Secondly,  we  need  to  select  an  appropriate  estimation  algorithm.  Although  a  wide 
choice  exists,  the  most  efTective  in  terms  of  speed  of  convergence  and  accuracy  is  the 
recursive  least  squares  algorithm  [Ref.  1]. 
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Figure   1.      The  Adaptive  Control  System. 


The  origins  of  the  least-squares  method  can  be  traced  back  to  Gauss  as  in  [Ref.  2]. 
In  its  recursive  version  it  has  been  formulated  by  several  authors.  The  major  drawback 
of  recursive  least-squares  identification  is  its  cost  in  terms  of  complexity,  which  might 
make  it  unsuitable  for  real  time  identification  of  a  large  number  of  parameters,  since  the 
size  of  the  matrices  involved  grow  with  the  complexity  of  the  the  system  to  be  estimated. 
Although  available  microprocessors  are  efTective  for  low  order  systems  and  slow  sampl- 
ing rates,  more  complex  problems  require  improved  capabilities. 

In  this  thesis  we  address  the  problem  of  implementing  recursive  least-squares  iden- 
tification using  parallel  processing  techniques  and  systolic  arrays.  Particularly,  the  pa- 
rameter estimates  are  determined  by  the  least-squares  solution  of  a  redundant  number 
of  linear  equations  obtained  from  the  measured  data.  The  techniques  of  solutions  for 
this  class  of  equations  are  based  on  QR  decomposition,  discussed  in  the  next  chapter. 


B.     SYSTOLIC  ARRAY 

The  idea  of  systolic  array  was  developed  by  Kung  and  associates  [Ref.  3]  at 
Carnegie-Mellon  University,  and  many  versions  of  systolic  processors  are  being  designed 
by  universities  and  industrial  organizations.  This  subsection  reviews  the  basic  principle 
of  systolic  architectures  and  explains  why  they  should  result  in  cost-effective  high  per- 
formance, special  purpose  systems  for  a  wide  range  of  potential  applications. 

A  systolic  array  consists  of  a  set  of  interconnected  cells,  each  capable  of  performing 
simple  operations.  Since  simple,  regular  communication  and  control  structures  have 
substantial  advantages  over  complicated  ones  in  design  and  implementation,  the  cells  in 
a  systolic  system  are  typically  interconnected  with  the  immediate  neighbors.  In  the 
systolic  array  we  are  considering  there  are  two  kinds  of  cells  (boundary  cell,  internal 
cell).  Each  cell  in  the  array  is  provided  with  local  memory  of  its  own,  and  it  is  connected 
only  to  its  nearest  neighbors.  The  array  is  designed  such  that  regular  streams  of  data 
are  clocked  through  it  in  a  highly  rhythmic  fashion,  much  like  the  pumping  action  of  the 
human  heart;  hence  the  name  "systolic".  Information  in  a  systolic  system  flows  between 
cells  in  a  pipelined  fashion,  and  communication  with  the  outside  world  occurs  only  at 
the  boundary  cells. 

Basic  principle  of  a  systolic  array  is  illustrated  in  Figure  2.  Suppose  each  processing 
element  in  Figure  2  operates  with  a  clock  period  of  100  ns.  The  conventional  memory 
and  processor  organization  in  Figure  2a  has  5  million  operation  per  second.  With  same 
clock  rate,  the  systolic  array  will  result  in  30  million  operation  per  second  performance 
provided  the  processing  elements  operate  in  parallel  on  pipelined  data.  The  gain  in 
processing  speed  has  been  increased  six  times.  Being  able  to  use  each  input  data  item  a 
number  of  times  is  just  one  of  the  many  advantages  of  the  systolic  approach.  Other 
advantages  include  modular  expandability,  simple  and  regular  data  and  control  flows. 
use  of  simple  and  uniform  cells  and  fast  response  time. 

Previous  authors  have  presented  parameter  estimation  algorithms  using  systolic 
arrays.   The  general  idea  has  been  to  solve  a  system  of  linear  equations  in  two  stages: 

1.  Triangularization  of  the  matrix  of  coefficients, 

2.  Solving  by  successive  substition. 

As  explained  in  Chapter  III  the  previous  algorithms  used  a  triangular  systolic  array 
to  triangularize  the  matrix,  and  a  linear  systolic  array  configuration  to  solve  for  the  pa- 
rameters. The  linear  section  requires  operations  such  as  divisions  which  are  hard  to 
implement  by  simple  processor    operations.      Because  of  this,  a    new    algorithm  was 


developed.  In  our  implementation,  a  second  identical  triangular  array  has  been  used 
instead  of  the  linear  systolic  array.  It  is  characterized  by  the  fact  that  only  orthogonal 
operations  are  involved,  making  the  algorithm  numerically  more  stable  and  easily 
implementable  by  simple  shift  and  add  operations.  Also  this  algorithm  needs  two  dif- 
ferent cells  (boundary  and  internal);  previous  algorithms  needed  four  different  cells  (two 
for  a  triangular  array,  two  for  a  linear  array).  Although  more  total  cells  are  required  in 
this  implementation,  the  cost  of  additional  cells  in  a  VLSI  scheme  is  considered  to  be 
minimal. 
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Figure  2.      The  Concept  of  Systolic  Processor  Array. 


In  implementations  based  on  fixed  point  arithmetics,  vector  rotations  necessary  for 
the  QR  factorization  can  be  implemented  by  a  CORDIC  algorithm,  based  on  simple 
shift,  add  operations. 

The  use  of  fixed  point  versus  floating  point  arithmetic  is  considered  during  this  in- 
vestigation. Because  fixed  point  operations  are  based  on  simple  shift  functions  and  finite 
registers,  which  are  simple  to  implement,  it  seems  advantageous  to  use  fixed  point  val- 
ues. However,  since  input  and  output  data  do  not  naturally  appear  as  integer  values, 
there  is  concern  over  loss  of  accuracy  due  to  necessary  scaling  and  truncation. 


This  research  report  is  divided  as  follows:  Chapter  II  discusses  the  methods  of  sys- 
tem identification,  i.e..  solution  of  systems  of  linear  equations,  QR  decomposition,  and 
recursive  least  squares  algorithm,  block  processing  and  covariance  resetting.  Chapter 
III  discusses  the  Givens  rotation,  the  CORDIC  technique  and  implementation  of  the 
systolic  arrays.  Chapter  IV  presents  the  simulation  results,  and  Chapter  V  shows  the 
final  conclusions.  A  listing  of  the  computer  program  is  used  to  simulate  the  systolic 
arrays  is  found  in  the  Appendix  B. 


II.     MODELS  FOR  SYSTEM  IDENTIFICATION 

A.     LINEAR  SYSTEM  MODELING 

Suppose  we  wish  to  identify  a  model  of  a  system  based  on  input-output  measure- 
ments. As  shown  in  Figure  3  consider  a  system  with  a  single  input  u(t)  and  single  output 

y(t). 


Figure  3.      Simple  Linear  System. 


If  we  consider  a  linear  difference  equation  to  model  the  dynamics,  we  can  write 

y(t)  +  axy{t  -  1)  +  ....  +  a„y(t  -  n)  =  bxu{t  -  1)  +  ....  +  bmu(t  -  in)  +  v{t)  (2.1) 

or,  equivalently 

y(t)  =  -auv(i  -  1)  -  ....  -  a„y(t  -n)  +  bxu{t  -  1)  +  ....  +  bmu{t  -  m)  +  v(t)  (2.2) 

where  v(t)  shows  noise  or  other  modeling  errors,  a,'s  and  b,  's  are  real  constants,  and  the 
equation  is  of  nth  order.  Since  the  output  depends  not  only  on  the  input  but  on  the 
previous  output  values,  this  discrete  system  is  known  as  recursive  system.  Equation  (2.2) 
can  be  written  in  the  matrix  form 


y(t)  =  [au...,an,bl,...,bm] 


-y{t 

-i) 

-y(t 

-n) 

u(t- 

•1) 

U(t- 

m) 

(2.3) 


where  6  represents  the  parameter  vector 

0T=[a,.....an.b-t b„J 

and  ip([)  represents  the  regression  vector. 


V(r)- 


By  the  above  definitions  we  can  write  Equation  (2.1)  as 


(2.4) 


-A 

t-l) 

-A 

l-n) 

U(l 

-1) 

u(t 

-m) 

(2.5) 


(2.6) 


The  problem  now  is  to  estimate  the  parameter  vector  0  from  measurements  of  the 
sequences  y(t)  and  <p{i).  Normally,  if  the  number  of  samples  of  u(t)  and  y(t)  (i.e..  the 
number  of  equations)  equals  the  number  of  unknowns  (m+n),  we  can  solve  exactly  for 
0.  However,  the  number  of  equations  is  usually  greater  than  the  number  of  unknowns, 
since  we  tend  to  collect  a  large  amount  of  data.  For  this  reason,  this  system  of  equations 
in  general  does  not  have  a  solution.  Ideally,  in  the  noiseless  case  (v(t)  =  0)  a  solution 
exists,  regardless  of  the  number  of  equations.  However,  since  noise  and  numerical  errors 
are  present,  we  look  for  the  least  squares  solution  of  the  system  of  equations,  i.e..  for  the 
solution  which  minimizes  the  error. 

If  we  take  N  samples  of  Equation  (2.6)  and  do  not  consider  noise  or  other  modeling 
errors  we  can  write  the  system  of  N  equations  in  matrix  form  as 


v(0 

At -l) 

_ 

>•('-• V) 

<p  (') 

</(/-!) 


tp'(i-X) 


(2.7) 


or.  equivalently 


b  =  AQ  (2.S) 

where  A  e  &****,  6  e  01s,  b  e  ^?Af .  When  N  =  M  and  A  is  full  rank  and  invertible,  we  can 
solve  uniquely  for  6  as 

6  =  A~]b  (2.9) 

However,  in  signal  processing  applications  we  often  face  the  case  of  M  >  N  (i.e.,  more 
equations  than  unknowns),  and  the  solution  of  (2.9)  is  defined  in  the  least  square  sense 
by  minimization  of  the  error 

c(0)  =  \\A6-b\\2  (2.10) 

Therefore,  the  least  squares  solution  6S  of  (2.7)  is  implicitely  defined  as 

ll^-idl^rninj^-rf  (2.11) 

The  least  squares  solution  always  exists,  althought  it  might  not  be  unique.  We  can 
solve  Equation  (2.11)  by  pseudoinverse  which  however,  involves  matrix  inversion.  An- 
other method  is  to  triangularize  A  in  Equation  (2.8)  and  solve  for  6  by  successive  back 
substitutions. 

When  M  >  N  we  meet  the  dilemma  of  triangularizing  an  array.  The  solution  is  to 
triangularize  the  upper  part  of  the  A  matrix,  leaving  one  or  more  rows  of  zeros  at  the 
bottom  as  a  result.  This  will  allow  us  to  solve  for  0  in  the  least  square  sense  as  indicated 
in  Equation  (2.1 1).   This  is  known  as  QR  decomposition. 

B.     SOLUTION  OF  THE  LEAST-SQUARES  PROBLEM  USING  QR 
DECOMPOSITION 

A  numerically  attractive  method  for  determing  the  least  squares  solution  of  a  sys- 
tem of  linear  equations  is  provided  by  the  QR  decomposition  of  a  matrix  [Ref.  4]. 
Consider  again  Equation  (2.8)  with  M  >  N.  It  can  be  shown  that  we  can  always 
factorize  A  as 

A  =  QR  (2.12) 

where  A  is  a  M  x  N  matrix,  Q  is  an  M  x  M  orthogonal  matrix  such  that  QTQ  =  /  and 
R  is  defined  as 


R  = 


(2.13) 


0 


where  R  is  an  N  x  N  upper  triangular  matrix  and  0  represents  null  matrix.  It  is  not  re- 
strictive to  assume  the  diagonal  entries  of  R  to  be  non  negative.  Now  we  write  again 
Equation  (2.9) 

AS  =  b 

with  the  equality  intended  in  least  squares  sense,  and  multiply  both  sides  by  QT 

QTQR6  =  QTb 

since  QrQ  =  I 

R0  =  OTb 

and,  therefore, 

E 

0 
It  is  now  easy  to  see  that  the  error  c(0)  is  given  by 

lAe-ht  =  \\Re-Pit  +  W2t  (2.i5) 

which  is  minimal  when 

R6  =  Pl  (2.16) 

since  fi2  is  independent  of  6.  In  particular,  if  A  is  full  rank  and  M  >  N,  then  the  solution 
is  unique.  R  is  mvertible  and  we  can  compute  the  least  squares  solution  of  (2.7)  and  (2.8) 


:QTt  = 


\h 


(2.14) 


es  =  R-lp] 


Now  the  solution  of  Equation  (2.8)  in  the  least  square  sense  can  be  computed  in 
the  same  manner  as  the  solution  of  a  system  of  N  equations  and  N  unknowns  in 
Equation  (2.16). 

There  is  a  number  of  methods  available  that  can  be  used  to  compute  the  upper 
triangular  matrix  R  (e.g.,  Householder  transformation,  Gram-Schmidt  procedure,  Giv- 
ens  rotation).  We  will  concentrate  on  Givens  rotation.  The  Givens  rotation  is  an  at- 
tractive method  for  systolic  array  implementation  since  it  successively  manipulates  two 
adjacent  rows  of  the  matrix  at  a  time.  CORDIC  techniques  can  be  used  to  implement 
these  rotations  and  are  discussed  in  the  next  chapter. 

C.     RECURSIVE  LEAST  SQUARES  ALGORITHM 

It  is  possible  to  estimate  the  values  of  the  model  parameters  0  by  observing  the 
nature  of  the  system's  response  under  appropriorite  experimental  conditions.  The  idea 
for  the  recursive  identification  problem  is  to  compare  the  output  with  that  of  an  adjust- 
able model,  and  update  the  parameters  until  the  error  between  the  outputs  of  the  model 
and  the  system  is  minimized  as  in  Fisure  4. 
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Figure  4.      The  Minimization  of  Error  in  the  System. 

As  seen  in  the  previous  section,  assumption  of  a  linear  time  invariant  model  leads  to  the 
relation 


y(t)  =  ipT(t)0 


(2.17) 


with  0  the  parameters  to  be  estimated,  and  <p(t)  a  measurable  sequence.  A  recursive  least 
squares  procedure  can  be  determined  by  defining  6,  the  estimate  of  0  at  time  t.  as  the 
vector  which  minimizes 


£;(0)  =  Zl-v(T)-^r(T)0|2 


(2.18) 


Particularly,  0:  is  the  least  squares  solution  of  the  equations 


q'U-V) 


oti0) 


>•('-!) 


vi  0 1 


-        A(t-l)Ot=y(t-l) 


(2.19) 


I:  is  possible  to  show  that  6.  can  be  recursively  computed  as  [Ref.  2] 
-fl  o  : 


0...  =0.-  P:i-\) 


1  -  <y  i.'iPu-  1)^1  n 


where  P[i)  =  {A'\nA{nr-  satisfies  the  recursion 
P  :-  [)<p{t)<pT{t)P(t -  It 


;■=/■-;-- 


1  -  tp  (l)P{t-  1  tp  : 


(2.20) 


(2.21) 


So  far.  we  have  neglected  the  problem  of  existence  of  the  solution.  The  matrix  A(t) 
can  be  singular,  and  then  Pit)  might  not  exist.  Furthermore,  we  can  not  initialize  A(-l ) 
.  use,  in  this  case  this  would  yield  P(-l)  undefined,  and  we  would  not  know  how 
to  start  the  sequence  P(t).  A  solution  to  this  problem  is  to  incorporate  initial  conditions 
into  Alt)  so  that  P(t)  can  be  computed  at  each  t.  Therefore,  we  modify  the  definition 
of  6,  in  Equation  (2.19),  so  that  initial  conditions  of  the  matrix  P(t)  can  be  accounted. 
Let  A{  -1)  be  any  arbitrary  matrix  (it  must  be  full  rank).  For  convenience  let  A(—  1) 
be  square,  with  det  {A{  -l))±  0,  and  define  P(-l)=  {AT{  -\)A{  -l))_l.  This  can  be  done 
by  defining  a  new  error  criterion 


tUB) 


-  £| y(r)  -  0rnKT)|2+  (B  -  60)T(AT(-l)A(-\))-\d  -  0O) 


(2.22) 


which  takes  the  initial  estimate  0O  into  account.      This  new  formulation  leads    to  the 
matrix  A(t)  as 


4>\t-\) 


A{f)  = 


(2.23) 


0r(O) 
A{-\) 


We  choose  A(-l)  =  aj,  where  o0  >  0  is  some  arbitrary  constant,  and  I  the  identity 
matrix  of  appropriate  dimensions.  In  this  way  the  matrix  P(t)  is  always  defined  since 
AT{t)A{t)  is  never  singular.  Then,  by  algebraic  manipulations,  Equation  (2.19)  can  be 
written  as 


(2.24) 


0r(r-l) 

M'-l) 

</>r(0) 

#/  = 

M 

A(-l) 

A(-\)B0 

The  solution  to  (2.24)  is  computed  recursively  using  (2.17)  and  (2.18)  and  the  appropri- 
ate initial  conditions. 

D.     BLOCK  PROCESSING  AND  COVARIANCE  RESETTING 

From  the  previous  considerations,  the  algorithm  was  described  that  allows  us  to  es- 
timate the  parameter  6  recursively.  By  using  Equation  (2.20)  and  P(t)  =  [A(t)T A(t))-^  we 
obtain 


P(t)  =  (A  T(t)A(t)T]  =  (a20I  +  ^(^(/>r(/))- 


(2.25) 


In  general,  the  term  Z<P(')<Pr(0  is  likely  to  grow  with  time,  so  that  P(t)  ->  0  as  t  -»  oc 
[Ref.  2].    Therefore,  the  algorithm  loses  sensitivity  as  t  increases,  and  later  values  of  t 


may  not  be  as  accurate  as  earlier  values,  especially  if  our  model  changes  with  time. 
There  are  two  possible  solutions  to  this  problem: 

1.  The  use  of  a  "forgetting  factor",  and 

2.  The  covariance  resetting  approach. 

By  the  forgetting  factor  approach  we  minimize  the  error 


IKOIP 


(2.26) 


where  0  <  /  <  1  is  the  forgetting  factor.    This  has  the  effect  of  assigning  a  higher  weight 
to  more  recent  data. 

The  covariance  resetting  approach,  on  the  other  hand,  divides  the  time  scale  into 
segments  of  equal  and  fixed  length  N  as  in  Figure  5.     At  the  end  of  each  time  block, 
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kN 
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Solve  for  9 


Figure  5.      Covariance  Resetting  Approach. 


we  reset  the  covariance  matrix  P(t).    Although  it  can  be  reset  at  any  time,  for  conven- 
ience we  choose  the  end  of  each  block,  and  P(t)  now  becomes 


P(t)  = 


-1   T 


t  =  kS-\       k  =  0,1,2,., 


Equal ion(2. .21)         otherwise 


(2.27) 


In  particular,  at  the  beginning  of  each  interval  the  systolic  array  is  initialized  as  aaI,  and 
at  the  end  of  the  interval  (at  kN  +  N-1  for  the  kth  interval)  the  data  are  exited  from  the 
array  and  the  system  is  solved  to  obtain  0<*_1)A-.  This  estimate  is  used  as  an  initial  con- 
dition for  the  next  time  block. 

In  an  adaptive  control  context  it  has  been  shown  [Ref.  5]  that  an  external  input  u(t), 
sufficiently  rich  in  frequency  (n  sinusoids),  together  with  blocks  I*  of  sufficent  length  N 
provides  a  guarantee  for  a  consistent  estimation   as 


where  0*  represents  actual  parameters.   The  effect  of  various  lengths  of  N  will  be  inves- 
tigated later. 

If  we  apply  the  considerations  given  to  the  general  case,  we  can  see  that  the  pa- 
rameter estimates  at  the  end  of  each  time  block  are  related  by  the  equation 


4>\(k  +  \)X-\) 
4>T(kX) 

a  J 


y{(k+  1);V-  1) 


y(kX) 


°oekx 


(2.28) 


The  systolic  array  implementation  is  based  on  this  equation.  In  particular.  0{^l)N 
is  computed  from  Equation  (2.28)  at  the  end  of  each  time  block  by  making  the  leftmost 
matrix  upper  triangular. 

Although,  the  algorithm  presented  in  this  Chapter  assumes  a  single  input,  single 
output  (SISO)  plant,  it  can  be  extended  to  the  multiple  input-output  (MIMO)  plant 
[Ref.  6].  Additionally,  we  assume  the  plant  to  be  causal  and  of  known  order. 


III.      SYSTOLIC  ARRAY  IMPLEMENTATION 

Now.  the  problem  is  to  compute  the  solution  of  Equation  (2.28)  using  systolic  ar- 
rays. In  particular.  Equation  (2.28)  is  appropriate  for  parallel  implementation.  As  de- 
scribed in  [Ref.  5],  the  identification  problem  can  be  constructed  again  into  a  set  of  linear 
equations  as 

where  R,  is  in  upper  triangular  form.  We  can  compute  0{A_1)A-  by  using  two  processors 
in  cascade:  one  to  compute  R,  and  /?>,  ,  and  the  second  to  compute  0  from  Equation 
(3.1).  Notice  that  Equation  (3.1)  does  not  require  any  explicit  matrix  inversion  since 
R,  is  in  upper  triangular  form. 

As  seen  in  Equation  (2.2S).  we  initialize  the  array  at  the  beginning  of  each  time 
block  such  that 

R0  =  o0I 

h  =  °o^k\ 

This  has  the  effect  of  initializing  the  Rt  matrix  in  (3.1)  to  an  upper  triangular  form. 
Then,  at  each  discrete  time  t,  an  array  of  data  </?(/)  and  y(t)  will  be  passed  to  the  systolic 
array.  The  task  of  the  array  is  to  re-triangularize  the  data  at  each  clock  pulse,  so  the 
matrix  remains  in  the  form  prescribed  by  (3.1).  It  is  then  a  simple  matter  to  solve  for 
0(-n.v  •  This  technique  is  based  on  QR  decomposition  as  the  means  to  triangularize  the 
data  array. 

The  value  of  c0  relates  to  the  confidence  we  have  in  the  initial  estimates  of  0O.  As 
seen  in  Equation  (2.26)  a  larger  value  of  a0  .  will  increase  the  confidence  on  the  initial 
estimate  of  0O.   Equations  (2.24-2.28)  show  the  role  that  a0  plays. 

A.      GIVENS  ROTATION 

The  orthogonal  triangularization  process  may  be  carried  out  by  using  various 
techniques.  One  of  the  techniques  is  the  Gram-Schmidt  orthogonalization  procedure 
that  provides  the  mathematical  basis  for  the  pipelined  lattice  predictor.  Another  pow- 
erful technique  of  triangularization  by    QR  decomposition  is    provided  by  the  Givens 


rotation.  The  reason  for  this  choice  is  the  fact  that  the  Givens  rotation  operates  on  pairs 
of  adjacent  rows,  making  it  suitable  for  systolic  array  implementation.  The  object  is  to 
combine  two  adjacent  rows  in  the  matrix,  forcing  zeros  in  the  appropriate  positions  so 
to  obtain  an  upper  triangular  data  matrix.  Figure  6  shows  an  example  of 
triangularization  by  successive  Givens  rotations. 
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Figure  6.      Example  of  Row  Operations. 


Basic  idea:  Consider  anv  two  rows  of  data  as 


ak,2 
ak+l,2 


ak,n 
ak+\,n\ 


We  can  see  these  two  rows  as  a  sequence  of  vectors  in  Figure  7. 

If  we  rotate  all  these  vectors  by  an  appropriate  angle  a  then  aA+u  -*  0  since  the 
leftmost  vector  becomes  parallel  to  the  horizontal  axis.  This  is  accomplished  by  the 
following  operation 


cos  a     sm  a 


aK 


<*k,2 


■  sm  a     cos  a      ah_ 


ak,> 


The  value  of  the  angle  a.  can  be  determined  using  simple  trigonometry 
ak,\  .  ai<+\,\ 


I    2     7      2 

V  ak,\  +  ak- 


rr~~2 

V ak,\  +  ak+\,\ 


(3.2) 


This  same  rotation  a  is  then  applied  to  the  remaining  (x,y)  vectors  in  the  affected  rows 
(i.e.,  rows  k+  1,  k)  to  ensure  consistency.  Next,  the  sequence  of  rotations  is  repeated  for 
the  remaining  pairs  of  rows  (i.e.,  rows  k  +  n,  k+n-1;  k+n-1,  k+n-2;....;  k+2,  k+ 1)  in 
order  to  force  zeros  in  the  correct  locations  that  leave  an  upper  triangular  matrix. 
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Figure  7.      Example  of  Vector  Operations. 


We  remember  that  the  data  matrix  is  initialized  to  an  upper  triangular  form  (<70I  ). 
Then,  as  we  take  samples  from  the  signals  in  the  plant,  new  values  of  <p  (t)  and  y(t)  are 
added  to  the  matrix.  By  a  sequence  of  Givcns  rotations,  we  can  transform  it  into  an 
upper  triangular  matrix.  An  example  set  of  rotations  is  shown  in  Figure  S,  where  the 
operations  performed  by  the  systolic  array  at  each  clock  pulse  are  illustrated.  This  is 
repeated  until  t  =  kN  (i.e..  at  the  end  of  the  time  block),  at  which  time  the  parameters 
6  are  solved  for,  the  matrix  is  reinitialized,  and  the  process  is  repeated. 

Basis  of  the  Givens  rotation  is  the  matrix 


QiP4)  - 


h 

0 

0 

0 

r{p,q) 

0 

0 

0 

h 

(3.3) 


associated  to  each  pair  of  indexes  p,  q  e  (l,n+  1),  with  /,  and  I2  identity  matrices  of  di- 
mensions (q-2)  x  (q-2)  and  (n+  1-q)  x  (n+  1-q)  respectively,  and  r  is  a  2  x  2  matrix  of 
the  form 


r{p,q)  ■- 


c(j\q)       s{p,q)\ 
-s(p.q)       c[p,q)  | 


(3.4) 


The  matrix  Q{p,q)  is  an  orthogonal  matrix  and  has  the  property  that  it  afTects  only  two 
rows  at  a  time,  i.e.,  rows  q  and  q-1.  Application  of  the  transformation  Q{p,q)  to  any 
matrix  A  of  appropriate  dimensions  implies 
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where  x  indicates  other  elements  of  the  matrix  and  z  indicates 


-\R,- 


(3.5) 
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Figure  8.      Example  of  Givens  Rotation. 


In  Equation  (3.4)  c(p,q)  and  s(p,q)  are  provided  such  that 
c(p,q)aq-\j,  +  s(p,q)aqj,  =  z 

c{p,q)aqtP  -  s(p,q) a    x    =  0 


(3.6) 


In  Figure  8,  an  example  of  application  of  the  Givens  rotation  Q(3,4)Q(2.3)Q(1,2)  to  the 
left  matrix  results  in  the  rotated  matrix  shown  on  the  right  [Ref.  7].  As  seen  before  the 
Givens  rotation  requires  addition,  substraction,  multiplication,  division,  and  squares. 
Figure  9  illustrates  the  operations  of  each  of  the  cells.  Next,  we  will  see  how  to  perform 
the  rotations,  by  using  add  and  shift  operations  only  in  the  CORDIC  technique  algo- 
rithm. 
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Figure  9.      Givens  Rotation  Algorithm. 


B.      THE  CORDIC  TECHNIQUE 

Here  is  a  simple,  accurate  and  relatively  fast  method  for  generating  trigonometric 
functions.  The  technique  proposed  is  based  on  the  ingenious  coordinate  rotation  digital 
computer  (CORDIC)  technique  developed  by  J.  E.  Voider  [Ref.  Sj. 

The  principle  involved  in  CORDIC  is  to  rotate  a  vector,  represented  by  its  com- 
ponents X  and  Y,  through  a  set  of  successive  elementary  rotations  as  seen  in  Figure  10. 
Each  single  rotation  of  the  vector  (X,Y)  is  computed  at  each  step  by  a  combination  of 
simple  add,  subtract,  and  shift  operations. 
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Figure   10.      Example  of  CORDIC  Rotation. 

Consider  a  vector  R  represented  by  its  components  X  and  Y  and  rotate  it  through 
an  angle  ±  6.  The  results  are  the  rotated  coordinates  X'  and  V  ,  which  are  related  to  X 
and  Y  by  the  following  equations 


X'  =  X  cos  6  ±  Y  sin  6 
Y'  =  Ycosd  +  XsmO 


(3.7) 


where  the  top  sign  refers  to  clockwise  rotation. 

The  CORDIC  principle  is  based  on  performing  a  vector  rotation  in  a  sequence  of 
angular  steps  a,  such  that  the  sum  of  them  equals  to  6,  that  is 


'  =  «n  +  a,  ±  a, 


■±«„ 


(3.8) 


If  we  define  y  =  ±  1  ,  then  we  can  express  d  as 

The  problem  is  to  choose  the  value  of  a,  such  that  the  equations  above  can  be  imple- 
mented with  simple  add  and  shift  operations.  This  is  possible  if  the  values  are  chosen 
such  that 

ot,  =  tan~1(2"/)      i  =  0,1,2..../i  (3.10) 

Dividing  both  sides  of  Equation  (3.7)  by  cos  d,  gives 

cos  Of  '  '        '+i 

y.  (3,11) 
'—=  Yi±Xttandt=  YM 

COS  dj  '  '  '  H-l 

The  factors  A"  J  cos  0,  and  }",/  cos  6,  are  the  rotated  components  of  the  vector  (X,Y).  The 
new  vector  is  not  only  rotated,  but  also  scaled  by  a  factor  1/  cos  6,  .  If  6  is  now  replaced 
by  a,  =  tan_,(2"0  the  new  equations 

A',+  1  =  A)  ±  Yt2~l  =  A)  +  yjYF4  =  KtX' 

(3.12) 

The  terms  A'  and  }'  indicate  the  components  of  the  initial  vector  R.  The  term  A'  is  the 
factor  l/cos0,  by  which  A'.,  and  YM  are  larger  than  the  components  A"  and  Y  of  a 
vector  which  is  only  rotated. 

Rotations  by  angles  smaller  than  90°,  the  index  (i)  starts  with  0,1,2,....  and  contin- 
ues to  n.  The  resulting  values  for  2~'  ,  a,  and  K,  are  given  in  Table  1.  During  each  ro- 
tation the  magnitude  of  R  increases  by  K,=  1/  cos  a,  .  After  n  rotation  steps  the  value 
of  A'„  becomes 

"  =  1  1    '  =  cos20  "  cos  a,    "  cos  a2  '"  cos  a„  (3.13) 


The  value  of  Kn  thus  increases  with  each  rotation  step  and  approaches  1.6468  for  high 

values  of  i. 


Table  1. 

THE  BINARY  CORDIC  CONSTANTS. 

i 

2_1 

ex.  (in  degrees) 

Ki 

0 

1.0000000000 

45.00000000 

1.414213500 

1 

0.5000000000 

26.56505124 

1.581138826 

2 

0.2500000000 

14.03624340 

1.629800596 

3 

0.1250000000 

7.12501632 

1.642484060 

4 

0.0625000000 

3.57633432 

1.645688908 

5 

0.0312500000 

1.78991064 

1.646492240 

6 

0.0156250000 

0.89517384 

1.646639215 

7 

0.0078125000 

0.44761428 

1.646743467 

8 

0.0039062500 

0.22381056 

1.646756030 

9 

0.0019531250 

0.11190564 

1.646759170 

1  0 

0.0009765625 

0.05595300 

1.646759954 

Since  we  assume  the  number  of  rotations  to  be  constant,  K„  is  also  constant.  And 
to  correct  for  the  encrease  of  the  magnitude  of  R,  the  resulting  X„  and  }'„  values  must 
be  divided  by  Kn  after  the  rotation  operation  is  completed.  Figure  1 1  illustrates  a  series 
of  rotations  for  an  arbitrary  vector  which  we  desire  to  rotate  by  40°  .  Notice  that  in  the 
figure,  the  desired  angular  value  is  approached  quickly,  because  a,  decreases  by  approx- 
imately half  during  each  step.  The  y  sequence  in  the  figure  would  be  (  +1,  -1,  +  1, 
+  1,  +1,  -1,  -1,  -1,  +1,  -1,  +1  ). 

In  an  ideal  case  such  as  the  one  presented,  the  value  of  Y  decreases  during  each 
step.  However,  this  may  not  always  be  the  case.  For  example,  consider  the  vector  (X,Y) 
=  (5,2)  and  a  =  12.5°.  The  first  rotation  in  the  CORDIC  algorithm  would  be  -45°,  this 
gives  us  a  rotated  vector  (X,Y)  =  (4.94,-2.12)  and  a  =  -32.5°.  Notice  that  the  value 
of  |  }'|  has  actually  increased.  To  make  the  algorithm  more  elTicent,  we  modify  the  se- 
quence y  to  include  the  value  zero.    Whenever  a  rotation  causes  |  Y\  to  increase,  we  do 


Figure   11.      Set  of  CORDIC  Rotations. 


not  perform.it.  and  we  set  the  corresponding  y,  to  zero.  Then,  we  continue  on  with  the 
next  rotation  value  and  repeat  the  process.  For  the  example  just  mentioned,  the  next 
value  to  be  tried  would  be  26.6°  .  A  CORDIC  rotation  algorithm  can  be  seen  in  Figure 
12. 

C.      SYSTOLIC  ARRAYS 

In  this  section  we  will  examine  the  parallel  structure  that  will  be  used  to  solve  the 
least  squares  algorithm  described  above.  In  particular  we  aim  at  a  structure  that  accepts 
a  sequence  of  regression  vectors  <p{n)  and  signal  y(n)  as  input  and  then  outputs  the  es- 
timate for  the  parameter  0  .  Specifically,  we  are  interested  in  a  high  performance  parallel 
structure  that  can  be  implemented  directly  as  a  hardware  device  in  order  to  deliver 
maximum  throughput.  Systolic  arrays  represent  a  structure  suitable  for  these  charac- 
teristics. 


Initialize:    If    x  >  0    then    x 

=  x,  y  =  y 

else    x 
if    y  >    0    then    tf0  =  +1 

=-x,  y   =-y 

else    tf0  =  -1 

i=0 

while    i  >  N-.1    do 

if    y  j-  tfj  2    Xj    >  y ,    then 

*m    :=Xi 

Um  :=  y: 

*M     ==0 

else 

Xi-ri    :=  Xj+  tfj  2_.y  8 
yi+i    :=  yi-  ^j2  'xj 

if    y    >  0    then    ^H1  :=  +1 

else    tf^  :=  -1 

endif 

endwhile. 

Figure  12.      CORDIC  Rotation  Algol ithm. 

A  systolic  array  has  simple  and  regular  communication  and  control  structures,  and 
this  is  a  substantial  advantage  over  other  designs  and  implementations.  Additionally, 
we  want  to  allow  the  computations  to  proceed  concurrently  with  the  input,  in  order  to 
maximize  the  throughput.   This  is  known  as  pipelining  [Ref.  3]. 

Figure  13  shows  a  typical  system.  As  said  before,  the  array  is  simply  a  network  of 
processors  that  are  regularly  connected.  The  data  is  continuously  "pumped"  through 
this  structure,  thereby  minimizing  overall  execution  time,  since  all  the  processors  work 
in  parallel. 

It  was  shown  that  two  processors  would  be  necessary  to  solve  Equation  (3.1). 
Previously  used  configurations  have  consisted  of  a  triangular  array  (as  in  Figure  13)  to 
compute  the  upper  triangular  matrix,  and  a  linear  array  to  solve  the  system  of  equations. 
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Figure  13.      Systolic  Array. 

The  entire  systolic  array  is  controlled  by  a  single  clock.  Figure  14  shows  the  typical  de- 
sign for  the  case  d  (dimension)  =  3,  where  d  shows  the  number  of  columns  which  enter 
into  the  system.  In  this  thesis  we  will  use  the  alternative  configuration  in  which  the 
linear  section  is  replaced  by  a  second  triangular  section  identical  to  the  First  one.  This 
new  design  will  be  discussed  later  in  this  chapter.  Both  designs  use  a  single  clock  signal 
to  control  operations.  Before  continuing  on  to  discuss  the  alternative  design,  we  will 
review  the  structures  of  the  triangular  and  linear  sections. 
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Figure  14.      Systolic  Array  Procedure  for  Old  Design. 


1.      Triangular  Array 

The  triangular  systolic  array  performs  a  sequence  of  Givens  rotations.  The 
CORDIC  technique  is  a  possible  implementation  for  the  rotations.  The  processors  work 
simultaneously  at  each  clock  pulse.  The  data  regression  vector  (p{t)  and  output  signal 
y(t)  are  inputs  to  the  top  of  the  array,  and  rotations  are  calculated  at  each  clock  cycle. 

The  triangular  array  consists  of  two  types  of  cells:  edge  cells  (or  boundary  cells) 
and  internal  cells.  The  edge  cells  are  represented  by  the  circles  and  the  boundary  cells 
are  represented  by  the  squares  as  seen  in  the  Figure  13  and  Figure  14.  Figure  9  defines 
the  operations  of  these  cells.  The  edge  cell  computes  the  rotation  parameters  c  and  s 
as  seen  in  Equation  (3.2).  Each  cell  of  the  triangular  array  stores  an  element  of  the  up- 
per triangular  matrix  R(n)  from  Equation  (3.1).  and  it  is  initialized  to  zero  for  internal 
cells  and  to  gJ  for  the  edge  cells  as  mentioned  before.  The  function  of  each  row  of 
processing  cells  in  the  triangular  systolic  array  section  is  to  combine  one  row  of  the 
stored  triangular  matrix  with  a  vector  of  data  received  from  the  above  cells  in  such  a 
way  that  the  leading  element  of  the  received  data  vector  is  annihilated.  The  data  vector 
so  obtained  is  then  passed  downward  on  to  the  next  row  of  cells.  The  boundary  cells  in 
each  row  of  the  section  computes  the  rotation  parameters  and  then  passes  them  to  the 
right  on  the  next  clock  cycle.  The  internal  cells  apply  the  same  rotation  parameter  to 
all  other  elements  in  the  received  data  vector. 

A  delay  of  one  clock  cycle  per  cell  is  incurred  when  passing  the  rotation  pa- 
rameters along  a  row.  That  is  why  it  is  necessary  to  "skew"  the  input  data  as  seen  in 
Figure  14.  so  that  the  input  data  interacts  properly  with  the  previously  stored  triangular 
matrix.  Because  the  cells  are  operating  simultaneously,  the  data  in  the  system  at  any 
time  t  consists  of  values  from  (2n)  different  matrices.  Figure  15  demonstrates  this  for  a 
system  with  n=  3.    In  this  figure,  we  can  see  that  at  time  (t  +  5),  there  are  also  values 

present  from  the  five  previous  matrices  (i.e.,  t  +  4.t  +  3 ,t).   In  order  to  get  all  the  cells 

in  the  array  to  a  similar  time  state,  the  array  would  have  to  be  clocked  an  additional  2n-l 
(five)  cycles,  feeding  zeros  as  input  where  necessary.  At  the  completion,  all  cells  will  be 
at  the  same  time  (t  +  5)  [Ref.  7]. 

Note  that  at  the  same  time  the  triangularization  process  is  being  carried  out, 
the  column  vector  f$H  is  also  being  computed  by  the  rightmost  column  of  internal  cells 
using  y(n)  as  its  input. 
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Figure  15.      Data  Flow  in  Triangular  Section. 


At  the  end  of  the  triangularization  period  (N),  we  are  ready  to  solve  for  0(^1);Y. 
The  data  in  this  triangular  array  is  clocked  out  to  the  next  array  section  that  will  com- 
pute the  parameters. 
2.      Linear  Array 

The  linear  systolic  array  has  been  used  in  previous  implementations  to  solve  for 
the  estimated  parameters.  The  linear  section  consists  of  one  boundary  cell  and  (d-1) 
internal  cells  as  seen  in  Figure  14. 

The  operation  of  the  cells  as  they  compute  the  parameters  are  shown  in  Figure 
16.  Note  that  the  cells  are  difTerent  from  those  in  the  triangular  array,  increasing  to  four 
the  number  of  unique  cells  necessary  in  the  combined  system. 

It  is  shown  in  [Ref.  3]  that  the  time  required  to  solve  for  0  using  the  linear  array 
is  equal  to  2d.  At  the  end  of  the  period,  the  parameters  are  used  as  initial  values  for  the 
triangular  array,  and  the  triangular  section  again  begins  the  operation. 

We  now  replace  the  linear  section  with  a  second  triangular  section,  and  discuss 
the  differences  between  these  two  design  approaches. 


Figure  16.      Definition  of  Cell  Operation  for  Linear  Section. 


3.      The  Use  of  a  Second  Triangular  Array  as  the  Solution  Section 

An  alternative    implementation  can  be    obtained  by    solving  0  as  shown  in 
Fieure  17. 
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Figure  17.      Systolic  Array  Procedure  for  New  Design. 


In  this  implementation,  at  the  end  of  the  triangularization  period,  the  data  is 
passed  from  the  first  triangular  section  to  the  second  triangular  section  in  a  reversed 
position.  The  second  triangular  array  performs  the  same  type  of  operations  as  the  first, 
therefore,  the  cells  are  identical. 

The  reason  of  using  another  triangular  section  is  that,  by  proper  combinations 
of  rows,  we  can  force  zeros  into  all  elements  of  a  given  row  but  one,  so  that  we  can  solve 
for  each  of  the  parameters.  Also,  the  fact  that  orthogonal  operations  are  used  makes  it 
more  robust  in  the  presence  of  numerical  errors  [Ref.  7].  To  see  how  this  works,  consider 
an  arbitrary  set  of  equations  in  upper  triangular  form  as  in  Equation  (3.14).  Now  x3  is 
simply  solved  as  bjayy  To  solve  for  x2,  we  can  force  a23  to  zero  by  a  linear  combination 
of  rows  two  and  three.  Then  x2  is  found  to  be  b2'a:2.  Similarly,  by  row  operations  on 
row  1 .2  and  3  we  can  make  au  =  a13  =  0  in  row  one,  and  .r,  is  found  to  be  bjau.  This  type 
of  operations  are  exactly  what  the  triangular  array  was  designed  to  do. 
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Figure  IS  shows  these  operations  in  matrix  format,  and  as  mentioned  before 
data  is  in  reversed  position.  Notice  that  the  array  is  initialized  to  all  zeros  here,  whereas 
the  first  triangular  section  is  initialized  to  a0BkN  and  a0I. 

To  understand  how  the  system  operates,  recall  the  first  triangular  array  at 
time  N.  Now  we  must  feed  the  data  down  into  the  second  triangular  section  m  an 
appropriate  manner  so  that  we  can  solve  for  the  parameters.  The  same  delay  (one  clock 
cycle  per  cell)  in  propagation  of  data  applies  to  this  triangular  section  as  it  does  in  the 
first  section.  That  is  why  we  must  carefully  choose  when  to  sample  the  array  in  order 
to  get  the  correct  values  with  which  to  calculate  the  solution. 

Figure  19  illustrates  the  data  flow  for  a  simple  system  where  d  =  3.  The  input 
data  is  skewed  as  it  was  in  the  triangular  section.  It  can  be  seen  that  the  values  a33, 
ar  .  and  au  are  available  at  times  N+  1,  N+4,  and  N+7  respectively.  Similarly,  the 
values  of  b3,  b2.  and  bx  are  available  at  times  N  +  4,  N  +  6,  and  N  +  8.  In  general,  for  any 
size  system  n,  the  coefficients  a„  and  outputs  b}  are  available  as  shown  in  Table  2.  Note 
from  the  figure  that  the  coefficients  are  "picked  off'  from  the  edge  cells  at  the  appropri- 
ate times,  while  the  outputs  are  found  in  the  rightmost  set  of  internal  cells  [Ref.  3]. 
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Figure  18.      Solution  of  System  of  Equations. 


This  new  design  operates  slower  than  the  linear  system  seen  in  [Ref.  7].  The 
time  to  solve  for  0  in  this  design  is  equal  to  the  time  until  l.\  appears,  which  is  equal  to 
(3n  -  1).  This  compares  to  2n  (or  2d)  in  previous  implementations;  hence,  n-1  more 
clock  cycles  are  required. 

On  the  other  hand,  simplification  is  gained  in  the  manufacturing  process  since 
the  number  of  types  of  cells  is  reduced.  The  tradeoffs  to  be  considered  are  simplicity 
(cost)  versus  speed. 
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Figure   19.      Data  Flow  in  the  Second  Triangular  Array. 


Table  2.     TIMES  OF  AVAILABILTY  OF  DATA. 


Coefficient 

Times 

n,n 

N+1 

a 

N+4 

n-l.n-1 

3      n-2.n-2 

N+7 

a 

n-3,n-3 

N+10 

J 

: 

b 

n 

N 

+  (n  ♦  1) 

bn-1 

N 

♦  (n  +  3) 

b 

n-2,. 

N 

+  (n  +  5) 

b 

n-3 

N 

+  (n  +  7) 

IV.      SIMULATION  STUDY 

1.      Model  Equation 

In  general,  a  linear  model  can  be  expressed  by  either  the  difference  equation 
relating  the  input  and  output  sequence,  or  in  a  regression  form  as  mentioned  in  the 
previous  Chapters.  In  the  example  below,  we  will  consider  the  problem  of  estimating  the 
unknown  parameters  both  with  and  without  noise  present. 

In  our  simulation  program,  we  will  use  the  first  order  model  which  has  the 
discrete  time  transfer  function  such  that 

//.--i  =  7-^7  (4.1) 

which  corresponds  to  the  linear  difference  equation 

where  u(t)  and  y(t)  are  the  input  and  output  sequences  respectively.   Equation  (4.2)  can 
be  expressed  as 

•r- [«„*,]  (4.?) 

VT(t)  =  [y{t-\)Mt-lJ]  (4.4) 

This  corresponds  to  the  regression  model 

y(i)  =  8T<p(t)  (4.5) 

If  the  model  has  noise  we  obtain 

y(t)  =  dr<p(i)  +  v(t)  (4.6) 

For  proper  system  identification  the  input  sequence  must  contain  a  sufficient 
number  of  frequency  components.    In  our  simulating  program  we  used  the  sine  wave 

ui  i)  =  sinf  -^  )  +  sin(  -^ )  +  sin(  -^  )  +  sm(  -^  )  (4.7) 


For  identification  purposes  the  input  must  be  sufFicently  rich  in  frequencies  to 
excite  all  modes  of  the  system.  If  we  have  fourth  order  system  we  should  have  at  least 
four  different  sinusoidal  components. 

In  our  simulation  model  we  chose  the  values  of  the  parameters  as  a,  =  0.5  and 
bx  =  2.  In  the  parameter  estimation  problem,  these  values  will  be  assumed  to  be  un- 
known. 

2.  Noise  Model 

The  noise  term  v(t)  is  a  sequence  of  independent  random  variables  with  zero 
mean  values.   The  noise  is  added  to  both  the  incoming  signal  u(t)  and  measured  output 

y(t). 

3.  Choice  of  Initial  Values 

For  a  recursive  algorithm,  we  need  to  initialize  the  systolic  array  with  an  initial 
parameter  estimate  o06(0)  and  aQI  in  Equation  (2.28).  Here  o0  is  related  to  the  confidence 
of  initial  condition  of  6(0).  If  some  prior  information  about  6(0)  is  available,  it  should 
be  used  for  determining  proper  values  of  6(0).  If  no  prior  information  is  available  we 
choose  6{0)  =  0  . 

4.  Choice  of  Block  Length 

In  our  simulation  program  we  have  used  four  different  block  lengths  both 
without  noise  and  with  noise  in  order  to  test  the  convergence  rates  in  the  different  cases. 
We  chose  N  =  3.  7,  10,  15  values  for  the  block  lengths,  where  N  identifies  the  block 
sizes. 

5.  Floating  Point  and  Fixed  Point  Operations 

During  simulation,  the  use  of  floating  point  versus  fixed  point  arithmetic  is 
considered.  Fixed  point  arithmetic  operations  are  performed  using  simple  shift  oper- 
ations and  finite  registers.  Because  of  this,  they  are  simpler  to  implement  than  floating 
point  operations.  On  the  other  hand,  since  input  and  output  data  values  do  not  na- 
turally appear  as  integer  values,  there  is  some  concern  over  loss  of  accuracy.  The  sol- 
ution to  the  latter  problem  is  to  scale  all  the  numbers  so  that  they  stay  within  the  limits 
of  the  fixed  registers. 


A.      NOISELESS  MEASUREMENTS 

This  is  an  ideal  case.  Figure  20  through  27  illustrate  the  results  of  these  simulations. 
In  these  figures  the  estimated  parameters  (  0{  and  02 )  arc  plotted  along  the  vertical  axis, 
the  block  number  is  plotted  along  the  horizontal  axis.  In  the  following  we  will  discuss 
the  results  for  floating  point  and  fixed  point  arithmetic. 

1.  Results  Using  Floating  Point  Arithmetic 

Figures  20  through  23  illustrate  the  floating  point  results  for  block  lengths  of 
N  =  3,  7,  10  and  15  respectively.  As  seen  from  the  figures  the  parameters  exhibit  the 
fastest  rate  of  converge  when  N=  7.  Even  though  the  estimated  parameter  a,  converges 
very  fast  at  N"=  15,  the  other  estimated  parameter  6,  converges  slowly  in  this  block 
length.  As  it  can  seen  from  the  Table  3,  the  case  of  N  =  7  requires  minimum  number 
of  clock  cycles,  therefore  in  this  case  we  should  choose  a  block  length  of  seven. 

2.  Results  Using  Fixed  Point  Arithmetic 

This  situation  has  been  simulated  by  adding  the  random  noise  to  the  measure- 
ments and  the  computations,  so  to  account  for  round-off  errors. 

Figure  24  through  27  illustrate  the  fixed  point  results  for  the  same  conditions. 
Notice  that  the  parameters  converge  as  fast  as  floating  point  operations.  As  seen  from 
Table  4,  for  this  case  again,  N=7  requires  minimum  number  of  clock  cycles.  This  indi- 
cates that  the  degradation  due  to  the  implementation  is  not  dramatic  for  the  fixed  point 
processors.  When  we  consider  the  simplicity  of  fixed  point  processors,  this  is  a  distinct 
advantage. 

Table  3.     THE  RATE  OF  CONVERGE  FOR  FLOATING  POINT(NO  NOISE). 
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Figure  20.      Floating  Point  Operation  for  N  =  3  (No  Noise). 
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Figure  21.      Floating  Point  Operation  for  N  =  7  (No  Noise). 
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Figure  22.      Floating  Point  Operation  for  N  =  10  (No  Noise). 


Figure  23.      Floating  Point  Operation  for  N=  15  (No  Noise) 


Figure  24.      Fixed  Point  Operation  for  N  =  3  (No  Noise). 


Figure  25.      Fixed  Point  Operation  for  N  =  7  (No  Noise). 


Figure  26.      Fixed  Point  Operation  for  N=  10  (No  Noise). 
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Figure  27.      Fixed  Point  Operation  for  N=  15  (No  Noise). 


Table  4.     THE  RATE  OF  CONVERGE  FOR  FIXED 

POINT(NO  NOISE). 

BLOCK  LENGTH(N) 

BLOCK  NUMBER 

TOTAL 

AT 

CONVERGE 

CLOCK  CYCLES 

3 

30 

90 

7 

12.5 

87.5 

10 

15 

150 

15 

10.5 

157.5 

B.      NOISY  MEASUREMENTS 

As  mentioned  before  the  noise  term  v(t)  is  a  sequence  of  independent  random  var- 
iables with  zero  mean.  We  used  a  variance  of  0.5  for  these  random  variables.  As  ex- 
pected in  a  real  system,  the  noise  is  added  to  both  incoming  signal  u(t)  and  output  y(t). 

1.  Results  Using  Floating  Point  Arithmetic 

Figures  2S  through  31  show  the  results  for  the  four  different  block  lengths. 
As  seen  from  the  figures,  for  N=3  the  parameters  are  most  affected.  For  N=7  or 
N=  10,  the  parameters  are  less  affected  by  the  noise.  When  the  block  length  is  equal  to 
7,  the  parameters  have  converged  reasonably  well  within  eight  blocks  (56  cycles).  For 
N=  15,  we  see  that  the  parameters  are  least  affected  by  the  noise. 

2.  Results  Using  Fixed  Point  Arithmetic 

Figures  32  through  35  show  the  results  for  fixed  point  implementation.  Again 
we  see  that  they  exhibit  similar  performance  as  in  the  floating  point  cases.  Effects  of 
block  length  are  almost  the  same  as  described  previously. 

To  simulate  Fixed  point  behavior  we  added  random  disturbances  in  the  com- 
putations within  the  cells.  In  particular,  the  factors  c  and  s  are  affected  by  round-off 
errors  which  we  can  simulate  in  this  fashion.  A  similar  approach  has  been  taken  in  [Rcf. 
91  in  the  analysis  of  a  systolic  array  implementation  of  the  projection  operator. 
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Figure  28.      Floating  Point  Operation  for  N  =  3  (With  Noise). 
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Figure  29.      Floating  Point  Operation  for  N  =  7  (With  Noise). 
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Figure  3U.      Floating  Point  Operation  for  N=  10  (With  Noise). 
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Figure  31.      Floating  Point  Operation  for  N=  15  (With  Noise). 


Figure  32.      Fixed  Point  Operation  for  N  =  3  (With  Noise). 
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Figure  33.      Fixed  Point  Operation  for  N  =  7  (With  Noise). 
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Figure  34.      Fixed  Point  Operation  for  N  =  10  (With  Noise). 


Figure  35.      Fixed  Point  Operation  for  N=  15  (With  Noise). 


V.      CONCLUSIONS 

In  the  previous  chapters  we  explained  the  estimation  of  the  model  using  a  parallel 
structure  and  recursive  least  squares  algorithm.  In  this  chapter  we  will  discuss  the 
tradeoffs,  advantages  and  disadvantages  of  the  systems  that  we  have  investigated. 

The  recursive  algorithm,  based  on  recursive  least  squares  with  covariance  resetting, 
is  redesigned  in  order  to  make  it  suitable  for  implementation  on  a  VLSI  chip  and  also 
modify  the  systolic  array  in  order  to  reduce  computing  time  and  complexity.  The  dif- 
ference in  adaptive  control  is  that  the  estimator  has  to  operate  recursively  on  subsequent 
blocks  of  data,  so  that  the  estimated  parameters  converge  to  the  respective  correct  val- 
ues. This  convergence  is  guaranteed  by  initializing  each  estimation  with  the  parameter 
values  from  the  previous  one.  and  by  persistency  of  excitation  of  the  external  inputs. 

We  simulated  several  possible  conditions  to  see  the  response  of  the  parallel  algo- 
rithm. Almost  in  each  condition  the  parameters  converged  close  to  their  own  values, 
even  in  the  presence  of  noise. 

This  parallel  algorithm  is  different  from  other  similar  algorithms  in  the  fact  that 
we  replaced  two  different  processors  by  two  identical  ones.  There  was  a  total  of  four 
different  computing  cells  (two  for  triangular  section,  two  for  linear  section)  in  the  old 
design.  For  the  new  design  we  need  only  two  different  cells  for  the  whole  system  as  ex- 
plained in  Chapter  III.  The  new  design  requires  more  total  cells  than  the  old  design. 
For  a  fourth  order  system  (d=4),  new  design  requires  10  additional  cells  from  the  for- 
mula of  —  d(d+  1).    However,  additional  cells  are  not  expensive  in  a  VLSI  schema. 

As  described  in  Chapter  III  additional  (d-1)  clock  cycles  are  required  to  operate  the 
second  triangular  section.  That  is  why  a  second  triangular  section  is  a  little  bit  slower 
than  the  linear  section. 

In  this  thesis  we  also  compared  floating  point  operations  with  fixed  point  oper- 
ations. For  floating  point  operations  we  saw  that  the  convergence  rate  increases  with 
the  block  length.  Also  when  the  block  length  is  small,  the  identification  is  more  sensetive 
to  the  presence  of  noise. 

We  got  almost  the  same  results  for  fixed  point  operations.  If  we  consider  the 
simplicity  of  the  fixed  point  processor,  a  significant  advantage  to  use  the  fixed  point 
arithmetic  is  due  to  its  simplicity. 


APPENDIX  A.     COMPUTER  PROGRAM  I 

A.     PURPOSE  OF  THE  PROGRAM 

This  program  converts  a  given  data  matrix  to  an  upper  triangular  matrix  by  using 
Givens  rotation  algorithm.  There  are  two  classes  of  cells.  Therefore,  two  different  types 
of  subroutines  are  used.  As  explained  in  Chapter  III  this  upper  triangular  matrix  again 
convert  to  another  upper  triangular  matrix  and  meanwhile  the  unknown  parameters  are 
computed. 

All  elements  of  the  matrix  are  given  interactively.  This  program  can  solve  49  x  50 
matrix.  If  run  this  program,  should  be  extended  virtual  storage  capacity  to  1500K  in 
advance  due  to  the  larse  array. 


C     ********* VAR I AB LE  DEC LARAT I ON********* 
IMPLICIT  REAL*8  (A-H,0-Z) 

REAL*8  A,C,S,U,LU,AOUT,AA,ED,IN,AAA,CC,SS,UU,AAOUT,LLU,EED, 
+IIN,THETA 

DIMENSION  A(50,50),C(0:  50,0: 50),S(0: 50 ,0: 50) ,U( 0: 50,0: 50), 
+LU(0:  50,0:  50),  A0UT(  0:50,0:  50)  ,  AA(50  ,50)  ,ED(  0:  50  ,  0:  50)  , 
+IN(0: 50,0: 50) , AAA(50 ,50) ,CC( 0: 50,0: 50),SS(0: 50,0: 50), 
+UU(0:  50,0:50),LLU(0:50,0:50),AAOUT(0:  50,0:50), 
+EED(0:  50,0:  50)  ,IIN(0:  50,0:  50)  ,THETA(50) 
INTEGER  I,J,K,M,N 
C     ************************************** 
C     **********VARIABLE  DEFINIATTON******** 
C     A(I,J)   =  DATA  MATRIX 
C     ED(I,J)   =  FIRST  TRIANGULAR  MATRIX 
C     EED(I,J)  =  SECOND  TRIANGULAR  MATRIX 
C     THETA(K)     =  UNKNOWN  PARAMETERS 
C     ************************************** 
C     ************DatA  ENTRANCE*****'******** 
PRINT  *, 'ENTER  THE  NUMBER  OF  COLUMNS  M=' 
READ(5,*)M 

PRINT  *, 'ENTER  THE  NUMBER  OF  ROWS  N=' 
READ(5,*)N 
PRINT  *,'  ' 
DO  1  1=1, N 
DO  2  J=1,M 

WRITE(6,3)I,J 
3         FORMAT( 'ENTER  A( ' ,12,12,' )') 

READ(5,*)  A(I,J) 
2       CONTINUE 
1     CONTINUE 

PRINT  *,'DATA  MATRIX' 
DO  4  1=1, N 


WRITE(6,*)(A(I,J),J=1,M) 

4  CONTINUE 
PRINT  *,'  ' 
DO  5  1=1, N 

DO  6  J=1,M 

AA(I,J)=A(N-(I-1),J) 

6  CONTINUE 

5  CONTINUE 

DO  7  1=1,2,3 
DO  8  J=1,M 

AA(N+I,J)=0.  0 

8  CONTINUE 

7  CONTINUE 

DO  9  1=1,2,3 
DO  10  J=1,M 
AOUT(I,J)=0.  0 

10  CONTINUE 

9  CONTINUE 

DO  11  1=1, N+l 
DO  12  J=1,M 

U(I,J)=AA(I,J) 

12  CONTINUE 

11  CONTINUE 

Q  -,'c  -.'.-  -.'.-  -.'.-  Vr  -.'.-  -.V  -.':  -.'--  -,':  -':  -,V  -.':  -,'.-  -V  -,V  -,': '':  -,'r  -'-  ■;'.-  -,'r  -,'.-  -.'.-  -':  -V :':  -V  -,V  Vr  -,'r :':  "V  -,',-  VrVflVVr 

C     *********first  TRIANGULARIZATION****** 

DO  13  K=1,N 
DO  14  J  =  K,M 

DO  15  I  =  l.N-K+2 
IF  (J.EQ.K)THEN 

CALL  EDGE(U(I,J),AOUT(I,J),AOUT(I+l,J),C(I,J),S(I,J)) 
ED(I,J)=AOUT(I,J) 
ELSE 

CALL  INTERNAL(U(I,J),AOUT(I,J),C(I,J-l),S(I,J-l),AOUT(I+l,J) , 
+LU(I,J),C(I,J),S(I,J)) 
U(I-1,J)=LU(I,J) 
IN(I,J)=AOUT(I,J) 
END  IF 
15        CONTINUE 
14      CONTINUE 

13  CONTINUE 

DO  20  1=1, N 

DO  21  J=1,N+1-I 
ED(I,J)=0.0 

21  CONTINUE 
20    CONTINUE 

DO  22  J=M,2,-1 

DO  23  K=N+3-J,N+l 
DO  24  I=K,N+1 
ED(I,J)=IN(I,J) 
24        CONTINUE 
23      CONTINUE 

22  CONTINUE 

PRINT  'S' FIRST  TRIANGULAR  MATRIX* 
DO  29  I=N+1,2,-1 

WRITE(6,*)(ED(I,J),J=1,M) 
29    CONTINUE 


PRINT  *,'  ' 
C     ************daTA  ENTRANCE************* 
DO  25  1=1, N 
DO  26  J=1,M-1 

AAA(I,J)=ED(I+1,M-J) 

26  CONTINUE 
25    CONTINUE 

DO  27  J=M,M+1,2 
DO  28  1=1, N 

AAA(I,J)=ED(I+1,M) 
28      CONTINUE 

27  CONTINUE 

DO  30  1=1,2,3 
DO  31  J=1,M 

AAA(N+I,J)=0.0 

31  CONTINUE 
30     CONTINUE 

DO  32  1=1,2,3 
DO  33  J=1,M 

AAOUT(I,J)=0.  0 

33  CONTINUE 

32  CONTINUE 

DO  34  1=1, N+l 
DO  35  J=1,M 

UU(I,J)=AAA(I,J) 

35  CONTINUE 

34  CONTINUE 

C     ********  SECOND  TR I ANGULAR I Z  AT I ON****** 
DO  36  K=1,N 
DO  37  J  =  K,M 

DO  38  I  =  l,N-K+2 
IF  (J.  EQ.  K)THEN 

CALL  EDGE(UU( I , J) ,AAOUT( I , J) ,AAOUT( 1+1 , J) ,CC( I , J) ,SS( I , J) ) 
EED(I,J)=AAOUT(I,J) 
ELSE 

CALL  INTERNAL(UU(I,J),AAOUT(I,J),CC(I,J-l),SS(I,J-l), 
+AAOUT(I+l,J),LLU(I,J),CC(I,J),SS(I,J)) 
UU(I-1,J)=LLU(I,J) 
IIN(I,J)=AAOUT(I,J) 
END  IF 

38  CONTINUE 
37      CONTINUE 

THETA(K)=IIN(2,M)/EED(2,K) 

36  CONTINUE 

DO  39  1=1, N 
DO  40  J=1,M-I 
EED(I,J)=0.0 
40      CONTINUE 

39  CONTINUE 

DO  41  J=M,2,-1 

DO  42  K=M+2-J,N+l 
DO  43  I=K,N+1 

EED(I,J)=IIN(I,J) 
43        CONTINUE 


42      CONTINUE 
41    CONTINUE 

PRINT  '%' SECOND  TRIANGULAR  MATRIX' 

DO  44  I=N+1,2,-1 

WRITE(6,*)(EED(I,J),J=1,M) 

44  CONTINUE 
PRINT  *,'  ' 

PRINT  *,' UNKNOWN  PARAMETERS' 
WRITE (*  46) 
46    FORMAT(f ' ,11X,'K' , 13X, 'THETA(K) ' ) 
DO  45  K=N,1,-1 

WRITE (6,*)  N-K+1,THETA(K) 

45  CONTINUE 
STOP 
END 


C     *  A  a  a  a  x  x  a  SUBROUTINE  GROUPS  FOR******** 

C       AAAAAAAAAAAAQELLS  FUNCTION" * ""*"""**** 


************BOUNDARY  CELL************* 

SUBROUTINE  EDGE  (U, AIN, AOUT.C ,S) 

IMPLICIT  REALMS  (A-H.O-Z) 

REAL*8  U,AIN,AOUT,C,S 

IF(U.  EQ.  0.  AND.  AIN.  EQ.  0)THEN 

C=1.0 

S=0.  0 

AOUT=U 

ELSE 

:=v  sqrt(  u**2.  odo+ain**2.  odo ) ) 

S=AIN/(SQRT(U**2.  0D0+AIN**2.  ODO)) 
A0UT=SQRT(U**2.  0D0+AIN**2.  ODO) 

END  IF 

END 


*v«v*********internal  CELL***-"'-******** 

SUBROUTINE  INTERNAL  ( PU, AIN, PC ,PS , AOUT,LU,LC ,LS) 

IMPLICIT  REAL*8  (A-H,0-Z) 

RE AL*8  PS  ,  PC ,  PU ,  S  ,  C  ,  U ,  AOUT ,  LU ,  AIN ,  LC ,  LS 

LU=( -PU*PS)+(AIN*PC) 

AOUT=(PU*PC)+(AIN*PS) 

LC=PC 

LS=PS 

RETURN 

END 


C     *****XHE  RESULTS  OF  THIS  PROGRAM****** 

ENTER  THE  NUMBER  OF  COLUMNS  M= 

3 

ENTER  THE  NUMBER  OF  ROWS  N= 


ENTER  A(  11) 

1 

ENTER  A(  12) 

2 

ENTER  A(  1  3) 

3 

ENTER  A(  2  1) 

4 

ENTER  A(  2  2) 

5 

ENTER  A(  2  3) 

6 

DATA  MATRIX 

1.0000 

4.0000 

2.0000 
5.0000 

3. 
6. 

0000 
0000 

FIRST  TRIANGULAR  MATRIX 
4.1231       5.3357 
0.0000       0.7276 

6. 
1. 

5484 
4552 

SECOND  TRIANGULAR  MATRIX 
5.3851       4.0852 
0.0000       0.5570 

6. 
-0. 

6850 
5570 

UNKNOWN  PARAMETERS 
K      THETA(K) 

1  -1.0000 

2  2.0000 


ENTER  THE  NUMBER  OF  COLUMNS  M= 

4 

ENTER  THE  NUMBER  OF  ROWS  N= 


ENTER  A(  1  1) 

3 

ENTER  A(  1  2) 

7 

ENTER  A(  1  3) 

11 

ENTER  A(  1  4) 

20 

ENTER  A(  2  1) 

5 

ENTER  A(  2  2) 


ENTER  A(  2  3) 
16 


ENTER  A(  2  4) 

21 

ENTER  A(  3  1) 

6 

ENTER  A(  3  2) 

8 

ENTER  A(  3  3) 

10 

ENTER  A(  3  4) 

22 

DATA  MATRIX 

3.  0000 

7. 

0000 

5.0000 

9. 

0000 

6.  0000 

8. 

0000 

11.  0000 
16.0000 

10.0000 


20. 0000 
21.0000 
22.0000 


FIRST  TRIANGULAR  MATRIX 

8.3666      13.6256  20.6774 

0.0000       2.8884  6.6670 

0.0000       0.0000  2.2345 

SECOND  TRIANGULAR  MATRIX 

21.8403      13.7818  7.9211 

0.0000       2.0151  2.3979 

0.0000       0.0000  1.2269 

UNKNOWN  PARAMETERS 
K      THETA(K) 

1  -1.7777 

2  5.8888 

3  -1.4444 


35.4982 

7. 3792 

-3. 2276 


35.5305 

7.6038 

-2.  1812 


ENTER 

5 

ENTER 


THE  NUMBER  OF  COLUMNS  M= 
THE  NUMBER  OF  ROWS  N= 


ENTER 

9 

ENTER 

14 

ENTER 

26 

ENTER 

31 

ENTER 

42 

ENTER 

14 

ENTER 

57 


A(  1  1) 

A(  1  2) 

A(  1  3) 

AC  1  4) 

A(  1  5) 

AC  2  1) 

A(  2  2) 

AC  2  3) 


ENTER  A(  2  4) 
38 


ENTER 
3 

ENTER 
9 

ENTER 

A(  2 

5) 

A(  3 

1) 

A(  3 

2) 

21 

ENTER 

A(  3 

3) 

42 

ENTER 

A(  3 

4) 

57 

ENTER 

AC  3 

5) 

48 

ENTER 

A(  4 

1) 

12 

ENTER 

A(  4 

2) 

19 

ENTER 

A(  4 

3) 

23 

ENTER 

A(  4 

4) 

45 

ENTER 

A(  4 

5) 

58 

DATA  MATRIX 

6. 

0000 

9. 0000 

14. 0000 

26. 

0000 

31. 

0000 

42. 

0000 

14.  0000 

57.  0000 

38. 

0000 

3. 

0000 

9. 

0000 

21.  0000 

42.  0000 

57. 

0000 

48. 

0000 

12. 

0000 

19. 0000 

23.0000 

45. 

0000 

58. 

0000 

FIRST  TRIANGULAR  MATRIX 

44. 

9999 

23.5333 

69. 6000 

62. 

3333 

32. 

0000 

0. 

0000 

22.  9168 

26.4032 

58. 

9561 

73. 

2183 

0. 

0000 

0.  0000 

14.  0252 

4. 

5607 

-14. 

6452 

0. 

0000 

0.  0000 

0.  0000 

3. 

4541 

6. 

2125 

SECOND 

TRIANGULAR  MATRIX 

85. 

9883 

69. 3000 

32. 7718 

32. 

6206 

72. 

8703 

0. 

0000 

30.5859 

-0.9184 

28. 

4896 

-35. 

7980 

0. 

0000 

0.  0000 

2.  0397 

7. 

9060 

4. 

9136 

0. 

0000 

0.0000 

0.0000 

9. 

3125 

4. 

7191 

UNKNOWN  PARAMETERS 
K      THETA(K) 

1  0.5067 

2  0.4447 

3  -1.6290 

4  1.7985 


APPENDIX  B.     COMPUTER  PROGRAM  II 

A.     PURPOSE  OF  THE  PROGRAM 

This  program  computes  the  estimated  weight  vector  of  least  squares  for  first  order, 
second  order  or  third  order  model  which  has  the  discrete  time  transfer  function  such  that 

b,Z2 

H(Z)  =  — 1 

Z   +  fliZ*  -f  a2Z  +  a3 

where  the  all  coefficients  are  given  interactively.  In  this  program,  addition  to  the  previ- 
ous program  is  used  another  two  subroutines  to  supply  the  gaussian  noise  which  corre- 
sponds to  parameter  v(t).  As  seen  in  the  plots  this  program  computes  the  unknown 
parameters  by  converging.  In  order  to  run  this  program  should  be  extended  virtual 
storage  capacity  to  2M. 


********* VAR I AB LE  DEC LARAT I  ON *** »»•••'•  ** 

REAL  C,S,U,LU,AOUT,ED,IN,X,Y,SIG,UIN,THETA,V,R,BM,AAA,CC,SS,UU, 
+LLU,AAOUT,EED,IIN 

DIMENSION  C(0:50,0:50),S(0:50,0:50),U(0:50,0:50),LU(0:50,0:50), 
+AOLT(0:50,0:50),ED(0:50>0:50),IN(0:50,0:50),X(50),UIN(0:50), 
+AAA(50,50) ,CC(0:  50,0: 50) ,SS(0: 50,0: 50) ,UU(0: 50,0: 50) , 
+LLU(0:  50,0:  50)  ,AAOUT(0:  50,0:  50)  ,EED(0:  50,0:  50)  , 
+IINC0:  50,0: 50) ,Y( -3: 50) ,THETA(0: 50) 

INTEGER    I,J,K,M,N,L,T 


c 
c 

**********VARIABLE  DEFINIATION******** 

ED(I,J)   =  FIRST  TRIANGULAR  MATRIX 

c 

EED(I,J)  =  SECOND  TRIANGULAR  MATRIX 

c 
c 

500 

print  Venter  the  size  of  matrix  n  by  n  **n=: 

READ  *,  N 

PRINT  *,'  N  =  ' ,N 

PRINT  *,'  ' 

PRINT  *, 'ENTER  THE  ORDERS  OF  DEN.  OF  DIFF.  EQ. 

READ  *,  NO 

PRINT  -»-,'  NO  =  '  ,N0 

PRINT  *,'  ' 

PRINT  *,'H0W  MANY  BLOCKS  DO  YOU  WANT  TO  ITERATE 

READ  *,  NBN 

PRINT  *,'  NBN  =  ' ,NBN 

PRINT  *,'  ' 

PRINT  *,' ENTER  THE  VALUE  OF  SIGMA  **SIG=?' 

**NO=? ' 


**NBN=? ' 


READ  *,SIG 

PRINT  -•••", 'SIG  =    '  ,SIG 


PRINT  *,'  ' 

IF  (NO.EQ.  1)  THEN 

'FORMAT  OF  1ST  ORDER  DIFF. EQ. **  Y(T)  =  B1*U(T-1)+A1*Y(T-1) 


PRINT  * 
READ  *,A1 
PRINT  * 

PRINT  * 
READ  *,B1 
PRINT  *  ' 
WRITE(* 


ELSEIF  ( 
PRINT  * 
++A2*Y(T 
PRINT  * 
READ  *,A1 


PRINT 

PRINT 
READ  *,A2 
PRINT 


PRINT  * 
READ  *,] 
PRINT  * 
WRITE(* 

ELSEIF  i 

PRINT  * 

++A2*Y(T 

PRINT  * 


READ  *,A1 


PRINT  * 

PRINT  * 
READ  *,A2 
PRINT  * 


A3 


PRINT  * 
READ 
PRINT  * 

PRINT  * 
READ  *,: 
PRINT  * 
WRITE(* 

ELSE 
PRINT  * 
PRINT  * 
GOTO  500 

ENDIF 

NB=15*N0 

Z=l 


ENTER  THE  COEFF.  OF  Al  ?' 
Al  =  ' ,A1 

ENTER  THE  COEFF.  OF  Bl  ?' 
1  =  ' ,B1 


46) 

NO.EQ.  2)  THEN 

'FORMAT  OF  2ND  ORDER  DIFF. EQ. **  Y(T)  =  B1*U(T-1)+A1*Y(T-1) 

2)' 

'ENTER  THE  COEFF.  OF  Al  ?' 


Al  =  ' ,A1 

ENTER  THE  COEFF.  OF  A2  ?' 
A2  =  ' ,A2 
ENTER  THE  COEFF.  OF  Bl  ?' 
1 


'Bl  = 
47) 

NO.EQ.  3)  THEN 

'FORMAT  OF  3RD  ORDER  DIFF. EQ. 

2)+A3*Y(T-3)' 

'ENTER  THE  COEFF.  OF  Al  ?' 


Y(T)  =  B1*U(T-1)+A1*Y(T-1) 


Al  =  ' ,A1 

ENTER  THE  COEFF.  OF  A2  ?' 

'A2  =  ' ,A2 

'ENTER  THE  COEFF.  OF  A3  ?' 

3 

'A3  =  ' ,A3 

'ENTER  THE  COEFF.  OF  Bl  ?' 

1 

'Bl  =  ' ,B1 

48) 


ERROR,  PLEASE  TRY  AGAIN' 
CHOOSE  N0=l,2  OR  3' 


PI=4*ATAN(Z) 

P=PI/10 

BM=0 

R=0.0707 

IX=1 

C       ***-.>**yoV'!V***]3ATA  ENTRANCE************* 

DO  1  LF=2,NO+2 

DO  2  KF=l,NO+2 

IF((LF.  GT.  KF).  AND.  ((LF-KF).EQ.  1))  THEN 
U(LF,KF)=SIG 
ELSE 

U(LF,KF)=0.0 
END  IF 
2       CONTINUE 
1     CONTINUE 

IF(NO.  EQ.  1)THEN 
Y(0)=0.  0 
UIN(0)=0. 0 

Y(1)=A1*Y(0)+B1*UIN(0) 
ELSEIF(NO.EQ.  2)THEN 
Y(-1)=0.0 
Y(0)=0.  0 
UIN(0)=0. 0 

Y( 1)=A2*Y( - 1 )+Al*Y( 0 )+B 1*UIN( 0 ) 
ELSE 

Y(-2)=0.0 
Y(-1)=0.0 
Y(0)=0.0 
UIN(0)=0.  0 

Y(l)=A3*Y(-2)+A2*Y(-l)+Al*Y(0)+Bl*UIN(0) 
END  IF 

DO  3  IT=1, NEN- 
DO 4  J=1,NB 

CALL  GAUSS(IX,R,BM,V) 

IF(NO.EQ.  1)THEN 

U(1,1)=Y(J-1) 

U(1,2)=UIN(J-1) 

U(1,3)=Y(J) 

UIN(J)=SIN(P*J)+SIN(P*2*J)+SIN(P*5*J)+SIN(P*6*J) 

Y(J+1)=A1*Y(J)+B1*UIN(J)+V 

ELSEIF(NO.EQ.  2)THEN 

U(l,l)=Y(J-2) 

U(1,2)=Y(J-1) 

U(1,3)=UIN(J-1) 

U(1,4)=Y(J) 

UINXJ)=SIN(P*J)+SIN(P*2*J)+SIN(P*5*J)+SIN(P*6*J) 

Y(J+1)=A1*Y(J)+A2*Y(J-1)+B1*UIN(J)+V 

ELSE 

U(l,l)=Y(J-3) 

U(l,2)=Y(J-2) 

U(1,3)=Y(J-1) 

U(1,4)=UIN(J-1) 

U(1,5)=Y(J) 

UINXJ)=SINXP*J)+SINXP*2*J)+SIN(P*5*J)+SIN(P*6*J) 

Y(J+l)=Al*Y(J)+A2*Y(J-l)+A3*Y(J-2)+Bl*UIN(J)+V 


ENDIF 
DO  5  11=1,2,3 
DO  6  JJ=1,N+1 
AOUT(II,JJ)=0.0 
U(N+II,JJ)=0.0 
6      CONTINUE 
5    CONTINUE 
C     ********firsT  TRIANGULARIZATION******* 

DO  7  K=1,N 

DO  8  Jl  =  K,N+1 
DO  9  I  =  l,N-K+2 
IF  (Jl.EQ.K)THEN 

CALL  EDGE(U(I,J1),A0UT(I,J1),A0UT(I+1,J1),C(I,J1),S(I,J1)) 
ED(I,J1)=A0UT(I,J1) 
ELSE 

CALL  INTERNAL(U(I,J1),A0UT(I,J1),C(I,J1-1),S(I,J1-1), 
+A0UT(I+1,J1),LU(I,J1),C(I,J1),S(I,J1)) 
U(I-1,J1)=LU(I,J1) 
IN(I,J1)=A0UT(I,J1) 
ENDIF 

9  CONTINUE 
8       CONTINUE 

7     CONTINUE 

DO  10  1=1, N 

DO  11  J2=1,N+1-I 
ED(I,J2)=0.0 

11  CONTINUE 

10  CONTINUE 

DO  12  J3=N+1,2,-1 
DO  13  K=N+3-J3,N+l 
DO  14  I=K,N+1 

ED(I,J3)=IN(I,J3) 

14  CONTINUE 
13      CONTINUE 

12  CONTINUE 
PRINT  *,'  ' 

C     ************DATA  ENTRANCE************* 
C     ******poR  SECOND  TRIANGULAR  ARRAY***** 
DO  15  1=1, N 
DO  16  J4=1,N 

AAA( I , J4)=ED( I+1,N+1-J4) 

16  CONTINUE 

15  CONTINUE 

DO  17  J5=N+l,N+2,2 
DO  18  1=1, N 

AAA(I,J5)=ED(I+1,N+1) 
18      CONTINUE 

17  CONTINUE 

DO  20  1=1,2,3 
DO  21  J6=1,N+1 
AAA(N+I,J6)=0.  0 
21       CONTINUE 
20     CONTINUE 

DO  22  1=1,2,3 
DO  23  J7=1,N+1 


AA0UT(I,J7)=0. 0 

23  CONTINUE 
22    CONTINUE 

DO  24  I=1,N+1 
DO  25  J8=1,N+1 

UU(I,JS)=AAA(I,J8) 

25  CONTINUE 

24  CONTINUE 

C     ********SEC0ND  TRIANGULARIZATION****** 
DO  26  K=1,N 

DO  27  J9  =  K,N+1 
DO  28  I  =  l,N-K+2 
IF  (J9.EQ.  K)THEN 

CALL  EDGE(UU(I,J9),AAOUT(I,J9),AAOUT(I+lJJ9),CC(I,J9),SS(I,J9) 
+) 

EED(I,J9)=AAOUT(I,J9) 
ELSE 

CALL  INTERNAL(UU(IJJ9),AAOUT(I,J9),CC(IsJ9-l),SS(I,J9-l)) 
+AAOUT(I+lJJ9),LLU(I,J9),CC(I,J9),SS(I,J9)) 
UU(I-1,J9)=LLU(I,J9) 
IIN(I,J9)=AAOUT(I,J9) 
ENDIF 

28  CONTINUE 
27      CONTINUE 

26  CONTINUE 
PRINT  * , '  ' 

DO  29  1=1, N 

DO  30  J2=1,N+1-I 
EED(I,J2)=0.0 

30  CONTINUE 

29  CONTINUE 

DO  31  J3=N+1,2,-1 
DO  32  K=N+3-J3,N+l 
DO  33  I=K,N+1 

EED(I,J3)=IN(I,J3) 

33  CONTINUE 
32      CONTINUE 

31  CONTINUE 

DO  34  MM=1,N+1 
DO  35  LL=2,N+1 
UU(LL,HM)=EED(N+3-LL,MM) 
35      CONTINUE 

34  CONTINUE 

IFCNO.EQ.  1)THEN 

IF( UU( 3 , 3) .  EQ.  0.  AND.  UU( 3 , 2) .  EQ.  0)THEN 

THETA(2)=0. 0 

ELSE 

THETA( 2)=UU( 3 , 3)/UU( 3 , 2) 

THETA(1)=(UU(2,3)-(THETA(2)*UU(2,2)))/UU(2J1) 

WRITE(6,*)   THETA(1),THETA(2) 

ENDIF 

ELSEIF(NO.EQ.  2)THEN 

IF(UU(4,4).  EQ.  0.  AND.  UU(4,3).  EQ.  0)THEN 

THETA(3)=0.  0 


ELSE 

THETA(3)=UU(4,4)/UU(4,3) 

THETA(2)=(UU(3,4)-(THETA(3)*UU(3,3)))/UU(3,2) 

THETA(1)=(UU(2,4)-(THETA(3)*UU(2,3)+THETA(2)*UU(2,2)))/UU(2,1) 

WRITE(6,*)   THETA(1),THETA(2),THETA(3) 

END  IF 

ELSEIF(NO.EQ.  3)THEN 

IF(UU(5,5).  EQ.  0.  AND.  UU(5,4).  EQ.  0)THEN 

THETA(4)=0. 0 

ELSE 

THETA(4)=UU(555)/UU(5,4) 

THETA(3)=(UU(4,5)-(THETA(4)'VUU(4,4)))/UU(4,3) 

THETA(2)=(UU(3,5)-(THETA(4)*UU(3,4)+THETA(3)*UU(3,3)))/UU(3,2) 

THETA(1)=(UU(2,5)-(THETA(4)*UU(2,4)+THETA(3)*UU(2,3)+THETA(2)* 
+UU(2,2)))/UU(2,1) 

WRITE (6,*)   THETA(1),THETA(2),THETA(3),THETA(4) 

ENDIF 

END  IF 

CONTINUE 

IF(NO.  EQ.  1)THEN 

Y(0)=Y(NB) 

UIN(0)=UIN(NB) 

Y(1)=Y(NB+1) 

ELSEIF(NO.EQ.  2)THEN 

Y(-1)=Y(NB-1) 

Y(0)=Y(NB) 

UIN(0)=UIN(NB) 

Y(1)=Y(NB+1) 

ELSE 

Y(-2)=Y(NB-2) 

Y(-1)=Y(NB-1) 

Y(0)=Y(NB) 

UIN(0)=UIN(NB) 

Y(1)=Y(NB+1) 

ENDIF 

E 

,17X,'THETA1' ,17X,'THETA2') 
,17X,'THETAl' ,17X,'THETA2' , 18X, 'THETA3' ) 
^X/THETAl'  ,17XJ'THETA2'  ,  18X,  *THETA3'  ,  19X,  'THETA4'  ) 


-.  V  it  -.V  it  it  ■;',-  -.V  it  it  it  it  it  it  it  it  it  it  it  it  it  it  it  it  it  it  it  it 


3 

CONTINUE 

46 

FORMATC ' 

47 

FORMATC ' 

48 

FORMATC ' 

STOP 

END 

C  ********suBROUTINE      GROUPS  FOR******** 

C  ************CELLS  FUNCTION**-********* 

C 

C  ************BOUNDARY  CELL************* 

SUBROUTINE  EDGE   (U, AIN,AOUT,C,S) 

REAL  U,AIN,AOUT,C,S 

IF(U.  EQ.  0.  AND.  AIN.  EQ.  0)THEN 

C=l.  0 

S=0.0 


AOUT=U 

ELSE 

CALL  GAUSS(IX,R,BM,V) 

C=U/(SQRT(U**2.  0D0+AIN**2.  ODO)) 

C=U/(SQRT(U**2.  0D0+AIN**2.  ODO))+V 

S=AIN/(SQRT(U**2.  0D0+AIN**2.  ODO)) 

S=AIN/(SQRT(U**2. 0D0+AIN**2.  ODO))+V 

AOUT=SQRT(U**2.  0D0+AIN**2.  ODO) 

END  IF 

RETURN 

END 

»-*A-AAA*AAAAAA*-AA*AAAAAAAA*AAAAAAAAAAA"A 
AAAAAAAAAAAA  J NTERNAL  CE LL,v,v,w"v,v,VlV*'',"v'*,v 

SUBROUTINE  INTERNAL  (PU,AIN,PC,PS,AOUT,LU,LC,LS) 
REAL  PS,PC,PU,S,C,U,AOUTJLUJAIN,LC,LS 
LU=(-PU*PS)+(AIN*PC) 
AOUT=( PU*PC ) +( AIN*PS ) 
LC=PC 
LS=PS 
RETURN- 
END 


C     ********suBROUTINE  GROUPS 

C      AAAAAAAAAAAGAUSSIAN  NOISE* 

SUBROUTINE  GAUSS  (IX,S,AM,V) 
A=0.  0 

DO  90  1=1,12 
CALL  RANDU(IX,IY,Y) 
IX=IY 
90    A=A+Y 

V=(A-6.0)*S+AM 

RETURN 

END  ' 


SUBROUTINE  RANDU  (IX,IY,YFL) 
IY=IX*65539 

IF  (IY)  5,6,6 

5  IY=IY+2147483647+1 

6  YFL=IY 

YFL=YFL*.  46566 13E -9 

RETURN 

END 

r  ************************************** 
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